Overview

Dataset Statistics

Number of Variables 8
Number of Rows 112650
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 40.3 MB
Average Row Size in Memory 375.0 B
Variable Types
  • Numerical: 4
  • Categorical: 4
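
The overview figures can be cross-checked with a little arithmetic. The sketch below assumes that "MB" in this report means 2^20 bytes (the report does not say so), and it uses a small hypothetical table, `toy_rows`, to illustrate one common way of counting missing cells and duplicate rows. The helper names are illustrative and are not taken from the profiling tool.

```python
from collections import Counter

# Figures copied from the overview above.
N_ROWS = 112650
AVG_ROW_BYTES = 375.0

# Assumption: "MB" in the report means 2**20 bytes.
total_mb = N_ROWS * AVG_ROW_BYTES / 2**20


def count_missing_cells(rows):
    """Count cells that are None across all rows."""
    return sum(cell is None for row in rows for cell in row)


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (first occurrences are not counted)."""
    counts = Counter(rows)
    return sum(c - 1 for c in counts.values())


# Hypothetical toy table, not taken from the dataset.
toy_rows = [
    ("a", 1, 10.0),
    ("b", 2, None),
    ("a", 1, 10.0),
]

print(round(total_mb, 1))
print(count_missing_cells(toy_rows), count_duplicate_rows(toy_rows))
```

Under that assumption, the average row size multiplied by the row count reproduces the reported total size.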

Dataset Insights

  • index is uniformly distributed (Uniform)
  • order_item_id is skewed (Skewed)
  • price is skewed (Skewed)
  • freight_value is skewed (Skewed)
  • order_id has a high cardinality: 98666 distinct values (High Cardinality)
  • product_id has a high cardinality: 32951 distinct values (High Cardinality)
  • seller_id has a high cardinality: 3095 distinct values (High Cardinality)
  • shipping_limit_date has a high cardinality: 93318 distinct values (High Cardinality)
  • order_id has constant length 32 (Constant Length)
  • product_id has constant length 32 (Constant Length)
  • seller_id has constant length 32 (Constant Length)
  • shipping_limit_date has constant length 19 (Constant Length)

Variables


index

numerical

Approximate Distinct Count 112650
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1802400 B
Mean 56324.5
Minimum 0
Maximum 112649
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5-th Percentile 5632.45
Q1 28162.25
Median 56324.5
Q3 84486.75
95-th Percentile 107016.55
Maximum 112649
Range 112649
IQR 56324.5

Descriptive Statistics

Mean 56324.5
Standard Deviation 32519.3982
Variance 1.0575e+09
Sum 6.345e+09
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
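
The index statistics follow from closed-form results for the integers 0 to n - 1, which is a useful sanity check on the report. The sketch below assumes the index is exactly that sequence and that quantiles use linear interpolation between neighbouring ranks; `linear_quantile` is an illustrative helper, not a function of the profiling tool.

```python
import math


def linear_quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


n = 112650                   # number of rows reported above
index = range(n)             # assumption: index is exactly 0, 1, ..., n - 1

mean = (n - 1) / 2
variance = n * (n + 1) / 12  # sample variance (ddof = 1) of 0..n-1
std = math.sqrt(variance)
# Excess kurtosis of a discrete uniform distribution on n points.
excess_kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))

print(mean, std, std / mean, excess_kurtosis)
print(linear_quantile(index, 0.25), linear_quantile(index, 0.95))
```

The standard deviation reported above agrees with the sample (ddof = 1) formula, and the coefficient of variation is close to 1/√3, as expected for a uniform distribution starting at zero.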

order_id

categorical

Approximate Distinct Count 98666
Approximate Unique (%) 87.6%
Missing 0
Missing (%) 0.0%
Memory Size 10927050 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 00010242fe8c5a6d1b...
2nd row 00018f77f2f0320c55...
3rd row 000229ec398224ef6c...
4th row 00024acbcdf0a6daa1...
5th row 00042b26cf59d7ce69...

Letter

Count 1351833
Lowercase Letter 1351833
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2252967
  • order_id contains many words: 98666 words
  • order_id has words of constant length
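
The row labels in the Letter table match Unicode general category names (Lowercase Letter is `Ll`, Uppercase Letter is `Lu`, Space Separator is `Zs`, Dash Punctuation is `Pd`, Decimal Number is `Nd`). A minimal sketch of such a count using Python's `unicodedata` module follows; the identifier `example_id` is hypothetical, and the report's own implementation is not shown in this document.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Ll', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Hypothetical 32-character lowercase hex identifier, not a real order_id.
example_id = "0123456789abcdef" * 2

counts = category_counts(example_id)
print(counts["Ll"], counts["Nd"])  # lowercase letters, decimal digits

# Cross-check with the totals reported above for order_id:
# letters + digits should equal rows * constant length.
print(1351833 + 2252967 == 112650 * 32)
```

The reported letter and digit totals add up to 112650 rows times 32 characters, which is consistent with every order_id consisting only of lowercase letters and digits.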

order_item_id

numerical

Approximate Distinct Count 21
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1802400 B
Mean 1.1978
Minimum 1
Maximum 21
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • order_item_id is skewed right (γ1 = 7.5803)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 1
Median 1
Q3 1
95-th Percentile 2
Maximum 21
Range 20
IQR 0

Descriptive Statistics

Mean 1.1978
Standard Deviation 0.7051
Variance 0.4972
Sum 134936
Skewness 7.5803
Kurtosis 103.8527
Coefficient of Variation 0.5887
  • order_item_id is not normally distributed (p-value 7.2876e-25)
  • order_item_id has 13984 outliers
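
The outlier count is consistent with a 1.5 × IQR rule, although the report does not state which rule it applies, so treat that as an assumption. With Q1 = Q3 = 1 the IQR is 0, both fences collapse to 1, and every item position above 1 is flagged. The sketch below shows the rule on a hypothetical sample, `toy_item_ids`.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values strictly outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return sum(v < lower or v > upper for v in values)


# Quartiles reported above for order_item_id: Q1 = Q3 = 1, so IQR = 0.
print(iqr_fences(1, 1))

# Hypothetical sample of item positions, not taken from the dataset.
toy_item_ids = [1, 1, 1, 2, 1, 2, 3, 1]
print(count_outliers(toy_item_ids, q1=1, q3=1))

# Rows minus distinct orders, using the counts reported in this document.
print(112650 - 98666)
```

If each order's items are numbered from 1 (an assumption), the number of rows with order_item_id above 1 equals rows minus distinct orders, which matches the 13984 outliers reported. The flag therefore marks multi-item orders rather than anomalous values.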

product_id

categorical

Approximate Distinct Count 32951
Approximate Unique (%) 29.2%
Missing 0
Missing (%) 0.0%
Memory Size 10927050 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 4244733e06e7ecb497...
2nd row e5f2d52b802189ee65...
3rd row c777355d18b72b67ab...
4th row 7634da152a4610f159...
5th row ac6c3623068f30de03...

Letter

Count 1342019
Lowercase Letter 1342019
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2262781
  • product_id contains many words: 32951 words
  • product_id has words of constant length

seller_id

categorical

Approximate Distinct Count 3095
Approximate Unique (%) 2.7%
Missing 0
Missing (%) 0.0%
Memory Size 10927050 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 48436dade18ac8b2bc...
2nd row dd7ddc04e1b6c2c614...
3rd row 5b51032eddd242adc8...
4th row 9d7a1d34a505240900...
5th row df560393f3a51e7455...

Letter

Count 1325194
Lowercase Letter 1325194
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2279606
  • seller_id contains many words: 3095 words
  • seller_id has words of constant length

shipping_limit_date

categorical

Approximate Distinct Count 93318
Approximate Unique (%) 82.8%
Missing 0
Missing (%) 0.0%
Memory Size 9462600 B

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2017-09-19 09:45:3...
2nd row 2017-05-03 11:05:1...
3rd row 2018-01-18 14:48:3...
4th row 2018-08-15 10:10:1...
5th row 2017-02-13 13:57:5...

Letter

Count 0
Lowercase Letter 0
Space Separator 112650
Uppercase Letter 0
Dash Punctuation 225300
Decimal Number 1577100
  • shipping_limit_date contains many words: 40685 words
  • The most frequent word (20171130) occurs over 2.19 times as often as the second most frequent word (20171207)
  • shipping_limit_date has words of constant length
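
shipping_limit_date is profiled as categorical, but its constant length of 19 and the per-row character counts (2 dashes, 1 space and 14 digits per row) suggest a timestamp string. The sketch below assumes the format `%Y-%m-%d %H:%M:%S`, inferred from the truncated samples above and not confirmed by the report; the value `example` is hypothetical.

```python
from datetime import datetime

# Assumed format, inferred from the 19-character samples shown above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_shipping_limit(text):
    """Parse one shipping_limit_date string into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


# Hypothetical value, not a row from the dataset.
example = "2018-01-02 03:04:05"
parsed = parse_shipping_limit(example)

print(parsed.date(), parsed.hour)
print(len(example))
```

Parsing the column this way before profiling would let it be treated as a datetime variable instead of a high-cardinality categorical one.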

price

numerical

Approximate Distinct Count 5968
Approximate Unique (%) 5.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1802400 B
Mean 120.6537
Minimum 0.85
Maximum 6735
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 7.9231)

Quantile Statistics

Minimum 0.85
5-th Percentile 17
Q1 39.9
Median 74.99
Q3 134.9
95-th Percentile 349.9
Maximum 6735
Range 6734.15
IQR 95

Descriptive Statistics

Mean 120.6537
Standard Deviation 183.6339
Variance 33721.4195
Sum 1.3592e+07
Skewness 7.9231
Kurtosis 120.8229
Coefficient of Variation 1.522
  • price is not normally distributed (p-value 5.5839e-24)
  • price has 8427 outliers
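
For reference, skewness (γ1), kurtosis and the coefficient of variation can be computed from central moments. The sketch below uses the plain population-moment definitions for skewness and excess kurtosis and a sample (ddof = 1) standard deviation for the coefficient of variation; the report may apply different bias corrections, so these are assumptions. `toy_prices` is hypothetical.

```python
import math


def moments_summary(values):
    """Return (skewness, excess kurtosis, coefficient of variation)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2**1.5
    excess_kurtosis = m4 / m2**2 - 3
    sample_std = math.sqrt(m2 * n / (n - 1))  # ddof = 1
    return skewness, excess_kurtosis, sample_std / mean


# Hypothetical prices with one expensive item, not taken from the dataset.
toy_prices = [10.0, 12.0, 15.0, 20.0, 400.0]
skew, kurt, cv = moments_summary(toy_prices)
print(skew > 0, cv)

# Coefficient of variation from the figures reported above for price.
print(183.6339 / 120.6537)
```

A single large value is enough to push the skewness well above zero, which is the pattern the price column shows: a median of 74.99 against a maximum of 6735.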

freight_value

numerical

Approximate Distinct Count 6999
Approximate Unique (%) 6.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1802400 B
Mean 19.9903
Minimum 0
Maximum 409.68
Zeros 383
Zeros (%) 0.3%
Negatives 0
Negatives (%) 0.0%
  • freight_value is skewed right (γ1 = 5.6398)

Quantile Statistics

Minimum 0
5-th Percentile 7.78
Q1 13.08
Median 16.26
Q3 21.15
95-th Percentile 45.12
Maximum 409.68
Range 409.68
IQR 8.07

Descriptive Statistics

Mean 19.9903
Standard Deviation 15.8064
Variance 249.8425
Sum 2.2519e+06
Skewness 5.6398
Kurtosis 59.7855
Coefficient of Variation 0.7907
  • freight_value is not normally distributed (p-value 1.5744e-18)
  • freight_value has 12134 outliers

Interactions

Correlations

Missing Values